Measuring Dependence Powerfully and Equitably

Authors

  • Yakir Reshef
  • David N. Reshef
  • Hilary Finucane
  • Pardis Sabeti
  • Michael Mitzenmacher
Abstract

For high-dimensional data sets, it is common to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. If the statistic used systematically assigns higher scores to some relationship types (e.g., linear, exponential, etc.) over others, important relationships may be overlooked because of their type. This difficulty is avoided if the statistic is equitable [1], i.e., if, for some measure of noise, it assigns similar scores to equally noisy relationships regardless of relationship type.

In this paper, we introduce and characterize a population measure of dependence called MIC∗. We show three ways that MIC∗ can be viewed: as the population value of MIC, a highly equitable statistic from [2]; as a canonical "smoothing" of mutual information; and as the supremum of an infinite sequence defined in terms of optimal one-dimensional partitions of the marginals of the joint distribution in question. Based on this theory, we introduce an efficient algorithm for computing MIC∗ from the density of a pair of random variables, and we define a new consistent estimator MIC_e for MIC∗ that is efficiently computable. (In contrast, there is no known polynomial-time algorithm for computing MIC.) We show through simulations that MIC_e has better bias-variance properties than MIC, and that it has high equitability with respect to R^2 on a set of functional relationships.

Traditional data exploration focuses also on the power of using a statistic to test a null hypothesis of statistical independence. While MIC_e is designed for equitability rather than independence testing, we introduce a related statistic, TIC_e, that is a trivial side-product of the computation of MIC_e. We prove the consistency of independence testing based on TIC_e and show in simulations that this approach achieves excellent power.

This paper is accompanied by a companion paper [3] focused on in-depth empirical evaluation of several leading measures of dependence. That paper shows that MIC_e and TIC_e achieve state of the art equitability with respect to R^2 and power against independence, respectively. Taken together, our results show that MIC_e and TIC_e are a valuable new pair of tools for exploratory data analysis.
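
The screening workflow described at the start of the abstract (score every variable pair, keep the highest-scoring pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: `grid_score` is a hypothetical stand-in that computes mutual information on one fixed equal-frequency grid and divides by log(k). It is not MIC, MIC_e, or TIC_e, and the function names, the grid size `k=4`, and the synthetic data are all invented for the example.

```python
import math
import random
from itertools import combinations


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def grid_score(x, y, k=4):
    """Plug-in mutual information of the k-by-k binned data, divided by log(k).

    Illustrative stand-in only: a single fixed grid, not MIC, MIC_e, or TIC_e.
    """
    n = len(x)
    bx, by = equal_frequency_bins(x, k), equal_frequency_bins(y, k)
    joint = {}
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = [bx.count(a) / n for a in range(k)]
    py = [by.count(b) / n for b in range(k)]
    mi = sum((c / n) * math.log((c / n) / (px[a] * py[b]))
             for (a, b), c in joint.items())
    return mi / math.log(k)


def top_pairs(columns, keep=2, k=4):
    """Score every variable pair and return the `keep` highest-scoring ones."""
    scored = [(grid_score(columns[u], columns[v], k), u, v)
              for u, v in combinations(sorted(columns), 2)]
    return sorted(scored, reverse=True)[:keep]


# Synthetic data (assumed for illustration): two noisy functions of t plus noise.
rng = random.Random(0)
t = [rng.uniform(-1, 1) for _ in range(500)]
data = {
    "t": t,
    "linear": [2 * v + rng.gauss(0, 0.1) for v in t],
    "parabola": [v * v + rng.gauss(0, 0.05) for v in t],
    "noise": [rng.uniform(-1, 1) for _ in t],
}
ranking = top_pairs(data, keep=2)
for score, u, v in ranking:
    print(f"{u} vs {v}: {score:.3f}")
```

A crude fixed-grid score like this one should not be expected to be equitable: it may rank a low-noise non-monotonic relationship well below a linear one of similar noise, which is the kind of bias by relationship type the abstract warns about.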

Related articles

Discovering General Multidimensional Associations

When two variables are related by a known function, the coefficient of determination (denoted R^2) measures the proportion of the total variance in the observations explained by that function. For linear relationships, this is equal to the square of the correlation coefficient, ρ. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of ex...

Measuring the temperature dependence of refractive index in water phantom for precise monitoring of the absorbed dose by radiation therapy techniques

In this paper, an experimental setup is used to measure the temperature dependence of the refractive index of a water phantom. In radiation calorimetry by laser beams and interferometric systems, the amount of change induced by radiation within the phantom can be accurately measured. Absorption of dose and the resulting change in temperature change the refractive index within the material. In o...

General Theory of Cycle-Dependence of Total pi-Electron Energy

The theoretical treatment of cycle-effects on total pi-electron energy, mainly elaborated by Nenad Trinajstic and his research group, is re-stated in a general and more formal manner. It makes it possible to envisage several other ways of measuring the cycle-effects and points to further directions of research.

Some equitably 3-colorable cycle decompositions of Kv+I

Let G be a graph in which each vertex has been colored using one of k colors, say c_1, c_2, ..., c_k. If an m-cycle C in G has n_i vertices colored c_i, i = 1, 2, ..., k, and |n_i − n_j| ≤ 1 for every i, j ∈ {1, 2, ..., k}, then C is equitably k-colored. An m-cycle decomposition C of a graph G is equitably k-colorable if the vertices of G can be colored so that every m-cycle in C is equitably k-col...

Spatial inequalities in access to public libraries in the country

Purpose: As the world rapidly moves into the new era of information and wisdom, the needs of human beings to use books and libraries continue to grow. With this in mind, in addition to increasing public library services, it's important for planners to distribute library facilities equitably across the country. In this paper we will study the distribution of public libraries in the Provinces of ...

Journal:
  • Journal of Machine Learning Research

Volume 17

Publication date: 2016